8 research outputs found

    Do Deep Neural Networks Suffer from Crowding?

    Get PDF
    Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks for object recognition. We analyze both standard deep convolutional neural networks (DCNNs) and a new version of DCNNs which is 1) multi-scale and 2) has convolution filters whose size changes with eccentricity, i.e. with distance from the center of fixation. Such networks, which we call eccentricity-dependent, are a computational model of the feedforward path of the primate visual cortex. Our results reveal that the eccentricity-dependent model, trained on target objects in isolation, can recognize such targets in the presence of flankers, if the targets are near the center of the image, whereas DCNNs cannot. Also, for all tested networks, when trained on targets in isolation, we find that recognition accuracy of the networks decreases the closer the flankers are to the target and the more flankers there are. We find that visual similarity between the target and flankers also plays a role and that pooling in early layers of the network leads to more crowding. Additionally, we show that incorporating the flankers into the images of the training set does not improve performance with crowding.

    Do deep neural networks suffer from crowding?

    No full text
    © 2017 Neural information processing systems foundation. All rights reserved. Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks (DNNs) for object recognition. We analyze both deep convolutional neural networks (DCNNs) and an extension of DCNNs that is multi-scale and that changes the receptive field size of the convolution filters with their position in the image. The latter networks, which we call eccentricity-dependent, have been proposed for modeling the feedforward path of the primate visual cortex. Our results reveal that the eccentricity-dependent model, trained on target objects in isolation, can recognize such targets in the presence of flankers, if the targets are near the center of the image, whereas DCNNs cannot. Also, for all tested networks, when trained on targets in isolation, we find that recognition accuracy of the networks decreases the closer the flankers are to the target and the more flankers there are. We find that visual similarity between the target and flankers also plays a role and that pooling in early layers of the network leads to more crowding. Additionally, we show that incorporating flankers into the images of the training set for learning the DNNs does not lead to robustness against flanker configurations not seen during training.
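    The multi-scale, eccentricity-dependent front end described in the two records above can be sketched roughly as follows. This is a minimal PyTorch illustration, assuming the eccentricity dependence is approximated by concentric crops of increasing size around the fixation point that share one filter bank; the class name, crop sizes, and the max-pooling over scales are illustrative choices, not the authors' exact architecture.

```python
# Minimal sketch of an eccentricity-dependent, multi-scale front end (PyTorch).
# Assumption: eccentricity dependence is approximated by cropping concentric
# regions of increasing size around the image centre, resizing each crop to a
# common resolution, and sharing one convolutional stack across scales.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EccentricityFrontEnd(nn.Module):
    def __init__(self, crop_sizes=(32, 64, 128), out_size=32, channels=16):
        super().__init__()
        self.crop_sizes = crop_sizes          # input must be at least as large as max crop
        self.out_size = out_size
        # One shared filter bank: resizing the crops makes its effective
        # receptive field grow with eccentricity (larger crop = coarser scale).
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):                     # x: (B, 1, H, W), fixation at centre
        _, _, h, w = x.shape
        cy, cx = h // 2, w // 2
        feats = []
        for s in self.crop_sizes:
            half = s // 2
            crop = x[:, :, cy - half:cy + half, cx - half:cx + half]
            crop = F.interpolate(crop, size=self.out_size, mode="bilinear",
                                 align_corners=False)
            feats.append(self.conv(crop))     # (B, C, out_size/2, out_size/2)
        # Stack scales and max-pool across them before a classifier head.
        return torch.stack(feats, dim=1).max(dim=1).values
```

    In a sketch of this kind, flankers placed far from the centre fall outside the smaller crops, which loosely mirrors the reported finding that targets near the center of the image remain recognizable despite flankers.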

    Decomposing Image Generation into Layout Prediction and Conditional Synthesis

    No full text
    Learning the distribution of multi-object scenes with Generative Adversarial Networks (GANs) is challenging. Guiding the learning using semantic intermediate representations, which are less complex than images, can be a solution. In this article, we investigate splitting the optimisation of generative adversarial networks into two parts, by first generating a semantic segmentation mask from noise and then translating that segmentation mask into an image. We performed experiments using images from the CityScapes dataset and compared our approach to Progressive Growing of GANs (PGGAN), which uses multiscale growing of networks to guide the learning. Using the lens of a segmentation algorithm to examine the structure of generated images, we find that our method achieves higher structural consistency in latent space interpolations and yields generations with better differentiation between distinct objects, while achieving the same image quality as PGGAN as judged by a user study and a standard GAN evaluation metric. © 2020 IEEE
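    The two-stage decomposition described above (noise to semantic layout, then layout to image) can be sketched at inference time as below. This is a minimal PyTorch illustration under stated assumptions: the module names, layer sizes, and 19-class Cityscapes-style label count are illustrative, and the paper's actual generators (and the PGGAN baseline) are considerably more elaborate.

```python
# Minimal sketch of the two-stage pipeline: noise -> segmentation layout -> image.
import torch
import torch.nn as nn

class LayoutGenerator(nn.Module):
    """Stage 1: map a latent vector to a per-pixel class layout (soft segmentation mask)."""
    def __init__(self, z_dim=128, n_classes=19, size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256 * (size // 8) ** 2), nn.ReLU(),
            nn.Unflatten(1, (256, size // 8, size // 8)),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, n_classes, 4, 2, 1),
        )

    def forward(self, z):
        return self.net(z).softmax(dim=1)      # soft one-hot layout per pixel

class MaskToImageGenerator(nn.Module):
    """Stage 2: translate the semantic layout into an RGB image."""
    def __init__(self, n_classes=19):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, mask):
        return self.net(mask)

z = torch.randn(4, 128)
layout = LayoutGenerator()(z)                  # (4, 19, 64, 64) semantic layout
images = MaskToImageGenerator()(layout)        # (4, 3, 64, 64) synthesized images
```

    Each stage would be trained adversarially against its own discriminator; the point of the split is that the layout distribution is simpler to learn than the image distribution itself.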

    Query-adaptive Video Summarization via Quality-aware Relevance Estimation

    No full text
    © 2017 Copyright held by the owner/author(s). Although the problem of automatic video summarization has recently received a lot of attention, the problem of creating a video summary that also highlights elements relevant to a search query has been less studied. We address this problem by posing query-relevant summarization as a video frame subset selection problem, which lets us optimise for summaries that are simultaneously diverse, representative of the entire video, and relevant to a text query. We quantify relevance by measuring the distance between frames and queries in a common textual-visual semantic embedding space induced by a neural network. In addition, we extend the model to capture query-independent properties, such as frame quality. We compare our method against the previous state of the art in textual-visual embeddings for thumbnail selection and show that our model outperforms it on relevance prediction. Furthermore, we introduce a new dataset, annotated with diversity and query-specific relevance labels. On this dataset, we train and test our complete model for video summarization and show that it outperforms standard baselines such as Maximal Marginal Relevance.
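    The scoring idea in this record (query relevance from a shared embedding space plus query-independent quality, combined in a subset-selection objective) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the embeddings, quality scores, and weights are stand-ins, greedy selection replaces the paper's learned subset-selection model, and the function names are hypothetical.

```python
# Minimal sketch of quality-aware, query-relevant frame scoring and greedy selection.
import numpy as np

def relevance(frame_emb, query_emb):
    """Cosine similarity between frames and the query in the shared embedding space."""
    f = frame_emb / np.linalg.norm(frame_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    return f @ q

def summarize(frame_emb, query_emb, quality, k=5, alpha=0.6, beta=0.2):
    """Greedily pick k frames trading off relevance, quality, and diversity."""
    rel = relevance(frame_emb, query_emb)
    f = frame_emb / np.linalg.norm(frame_emb, axis=1, keepdims=True)
    chosen = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(len(frame_emb)):
            if i in chosen:
                continue
            # Redundancy = max similarity to frames already in the summary.
            red = max((f[i] @ f[j] for j in chosen), default=0.0)
            score = alpha * rel[i] + beta * quality[i] - (1 - alpha - beta) * red
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

frames = np.random.randn(100, 64)   # stand-in frame embeddings
query = np.random.randn(64)         # stand-in query embedding
qual = np.random.rand(100)          # stand-in per-frame quality scores
print(summarize(frames, query, qual, k=5))
```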

    Machine Vision for Real-Time Intraoperative Anatomic Guidance: A Proof-of-Concept Study in Endoscopic Pituitary Surgery

    Full text link
    BACKGROUND: Current intraoperative orientation methods either rely on preoperative imaging, are resource-intensive to implement, or are difficult to interpret. Real-time, reliable anatomic recognition would constitute another strong pillar on which neurosurgeons could rest for intraoperative orientation. OBJECTIVE: To assess, in a proof-of-concept study, the feasibility of machine vision algorithms that identify anatomic structures using only the endoscopic camera, without prior explicit anatomo-topographic knowledge. METHODS: We developed and validated a deep learning algorithm to detect the nasal septum, the middle turbinate, and the inferior turbinate during endoscopic endonasal approaches, based on endoscopy videos from 23 different patients. The model was trained in a weakly supervised manner on 18 patients and validated on 5. Performance was compared against a baseline consisting of the average positions of the training ground truth labels, using a semiquantitative 3-tiered system. RESULTS: We used 367 images extracted from the videos of 18 patients for training, as well as 182 test images extracted from the videos of another 5 patients for testing the fully developed model. The prototype machine vision algorithm was able to identify the 3 endonasal structures qualitatively well. Compared to the baseline model based on location priors, the algorithm demonstrated slightly but statistically significantly (P < .001) improved annotation performance. CONCLUSION: Automated recognition of anatomic structures in endoscopic videos by means of a machine vision model that uses only the endoscopic camera, without prior explicit anatomo-topographic knowledge, is feasible. This proof of concept encourages further development of fully automated software for real-time intraoperative anatomic guidance during surgery.
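    The location-prior baseline mentioned in this record (predicting the average position of each structure's training annotations for every test frame) can be sketched as follows. This is a minimal NumPy illustration; the normalised box format, structure keys, and random stand-in labels are assumptions for illustration only.

```python
# Minimal sketch of the location-prior baseline: predict, for every test frame,
# the average position of each structure's training-set annotations.
import numpy as np

STRUCTURES = ["nasal_septum", "middle_turbinate", "inferior_turbinate"]

def fit_location_prior(train_boxes):
    """train_boxes: dict structure -> (N, 4) array of [cx, cy, w, h] in [0, 1]."""
    return {s: train_boxes[s].mean(axis=0) for s in STRUCTURES}

def predict(prior, n_frames):
    """Return the same averaged box for every test frame, per structure."""
    return {s: np.tile(prior[s], (n_frames, 1)) for s in STRUCTURES}

rng = np.random.default_rng(0)
train = {s: rng.uniform(0.2, 0.8, size=(367, 4)) for s in STRUCTURES}  # stand-in labels
prior = fit_location_prior(train)
baseline_preds = predict(prior, n_frames=182)   # one fixed box per structure, per frame
```

    Because endoscopic views of the same corridor are roughly aligned, such a fixed-position prior is a non-trivial baseline, which is why the reported improvement of the learned model over it, though slight, is meaningful.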

    Facial attractiveness of cleft patients: a direct comparison between artificial-intelligence-based scoring and conventional rater groups

    No full text
    OBJECTIVES: To evaluate facial attractiveness of treated cleft patients and controls by artificial intelligence (AI) and to compare these results with panel ratings performed by laypeople, orthodontists, and oral surgeons. MATERIALS AND METHODS: Frontal and profile images of 20 treated left-sided cleft patients (10 males, mean age: 20.5 years) and 10 controls (5 males, mean age: 22.1 years) were evaluated for facial attractiveness with dedicated convolutional neural networks trained on >17 million ratings for attractiveness and compared to the assessments of 15 laypeople, 14 orthodontists, and 10 oral surgeons performed on a visual analogue scale (n = 2323 scorings). RESULTS: AI evaluation of cleft patients (mean score: 4.75 ± 1.27) was comparable to human ratings (laypeople: 4.24 ± 0.81, orthodontists: 4.82 ± 0.94, oral surgeons: 4.74 ± 0.83) and was not statistically different (all Ps ≥ 0.19). Facial attractiveness of controls was rated significantly higher by humans than by AI (all Ps ≤ 0.02), with the AI scoring controls lower than cleft subjects. Variance was considerably large in all human rating groups when considering cases separately, and especially accentuated in the assessment of cleft patients (coefficient of variation: laypeople 38.73 ± 9.64, orthodontists 32.56 ± 8.21, oral surgeons 42.19 ± 9.80). CONCLUSIONS: AI-based results were comparable with the average scores of cleft patients in all three rating groups (with especially strong agreement with both professional panels) but overall lower for control cases. The variance observed in panel ratings revealed large imprecision due to a problematic lack of agreement among raters. IMPLICATION: Current panel-based evaluations of facial attractiveness suffer from dispersion-related issues and remain practically unavailable to patients. AI could become a helpful tool for describing facial attractiveness, but the present results indicate that important adjustments to the AI models are needed to improve the interpretation of the impact of cleft features on facial attractiveness.

    Facial attractiveness of cleft patients: a direct comparison between artificial-intelligence-based scoring and conventional rater groups

    No full text
    Objectives: To evaluate facial attractiveness of treated cleft patients and controls by artificial intelligence (AI) and to compare these results with panel ratings performed by laypeople, orthodontists, and oral surgeons. Materials and methods: Frontal and profile images of 20 treated left-sided cleft patients (10 males, mean age: 20.5 years) and 10 controls (5 males, mean age: 22.1 years) were evaluated for facial attractiveness with dedicated convolutional neural networks trained on >17 million ratings for attractiveness and compared to the assessments of 15 laypeople, 14 orthodontists, and 10 oral surgeons performed on a visual analogue scale (n = 2323 scorings). Results: AI evaluation of cleft patients (mean score: 4.75 ± 1.27) was comparable to human ratings (laypeople: 4.24 ± 0.81, orthodontists: 4.82 ± 0.94, oral surgeons: 4.74 ± 0.83) and was not statistically different (all Ps ≥ 0.19). Facial attractiveness of controls was rated significantly higher by humans than by AI (all Ps ≤ 0.02), with the AI scoring controls lower than cleft subjects. Variance was considerably large in all human rating groups when considering cases separately, and especially accentuated in the assessment of cleft patients (coefficient of variation: laypeople 38.73 ± 9.64, orthodontists 32.56 ± 8.21, oral surgeons 42.19 ± 9.80). Conclusions: AI-based results were comparable with the average scores of cleft patients in all three rating groups (with especially strong agreement with both professional panels) but overall lower for control cases. The variance observed in panel ratings revealed large imprecision due to a problematic lack of agreement among raters. Implication: Current panel-based evaluations of facial attractiveness suffer from dispersion-related issues and remain practically unavailable to patients. AI could become a helpful tool for describing facial attractiveness, but the present results indicate that important adjustments to the AI models are needed to improve the interpretation of the impact of cleft features on facial attractiveness.
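    The dispersion statistic reported in the two records above (coefficient of variation per case, averaged within each rater group) can be computed as sketched below. This is a minimal NumPy illustration; the rating matrices are random stand-ins on a 1–9 scale, not the study's data, and the group sizes simply mirror the panel sizes quoted in the abstract.

```python
# Minimal sketch of the panel-dispersion analysis: per-case coefficient of
# variation (SD / mean, in percent) for each rater group, averaged over cases.
import numpy as np

def coefficient_of_variation(ratings):
    """ratings: (n_cases, n_raters) VAS scores; returns mean and SD of CV (%) across cases."""
    per_case_cv = ratings.std(axis=1, ddof=1) / ratings.mean(axis=1) * 100
    return per_case_cv.mean(), per_case_cv.std(ddof=1)

rng = np.random.default_rng(1)
groups = {
    "laypeople":     rng.uniform(1, 9, size=(20, 15)),   # 20 cleft cases, 15 raters
    "orthodontists": rng.uniform(1, 9, size=(20, 14)),
    "oral_surgeons": rng.uniform(1, 9, size=(20, 10)),
}
for name, ratings in groups.items():
    mean_cv, sd_cv = coefficient_of_variation(ratings)
    print(f"{name}: CV = {mean_cv:.2f} ± {sd_cv:.2f} %")
```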